Architectures for Speech Synthesis from Human Voice Audio Database
Abstract
This paper describes ongoing work within the NLP group of Otago University on speech synthesis from diphone audio databases prepared from recordings of human voices. A speech synthesis system based on such databases is claimed to reach human-level performance, provided the descriptions supplied are detailed enough to extract and manipulate the voice data. The software provided by the project that develops the databases [1][2][3] takes a text file as the directive for speech synthesis, and this file accommodates many control parameters. For a speech system with detailed phonetic descriptions stored in dictionary entries, synthesizing speech for words is therefore a fairly straightforward task. Our work, however, aims at building a text-to-speech transducer without a dictionary, so that no limits are placed on the kinds of words it can process. Instead, we use a neural network as a learning system that stores the text-to-speech correspondence knowledge needed for the task. In this paper we introduce the training parameters and the encoding method used for high-quality speech synthesis, and the text-to-speech system as a whole. In particular, we introduce work currently being carried out on synthesizing speech for French words, based on one of the voice databases provided by the MBROLA project [3].
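The abstract does not show the directive text file itself. As a minimal sketch, assuming it follows the publicly documented MBROLA `.pho` layout (one phoneme per line: a SAMPA symbol, a duration in milliseconds, then optional pairs of position-in-percent and pitch-in-Hz), such a file could be generated as below. The helper name `format_pho` and all durations and pitch values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# One segment: (SAMPA phoneme, duration in ms, [(position in %, pitch in Hz), ...]).
# The pitch list may be empty, in which case only phoneme and duration are written.
Segment = Tuple[str, int, Sequence[Tuple[int, float]]]


def format_pho(segments: List[Segment]) -> str:
    """Render segments as text in the assumed MBROLA .pho layout (hypothetical helper)."""
    lines = []
    for phoneme, duration_ms, pitch_points in segments:
        fields = [phoneme, str(duration_ms)]
        for position_pct, pitch_hz in pitch_points:
            fields.append(str(position_pct))
            fields.append(f"{pitch_hz:g}")  # drop trailing zeros, e.g. 115.0 -> "115"
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


# Illustrative values only: the French word "bonjour" in SAMPA,
# framed by silence segments ("_"). Timings and pitches are made up.
bonjour: List[Segment] = [
    ("_", 50, [(25, 115.0)]),
    ("b", 60, []),
    ("o~", 130, [(50, 170.0)]),
    ("Z", 110, []),
    ("u", 210, []),
    ("R", 150, [(50, 90.0)]),
    ("_", 10, []),
]

pho_text = format_pho(bonjour)
print(pho_text)
```

In a dictionary-free system of the kind the abstract describes, the phoneme sequence would come from the learned text-to-speech mapping rather than from a hand-written list like `bonjour` above.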
Related Resources
A New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)
Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, many non-speech signals may be classified as ‘noise’ in human communication. The process of separating conversational speech from noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...
Emotional Speech Synthesis
Emotional speech synthesis is an important piece of the puzzle on the long road to human-like human-machine interaction. Along the way, many milestones, such as emotional audio messages or believable characters in gaming, will be reached. This chapter discusses technical aspects of emotional speech synthesis, shows practical application and highlights new developments concerning the reali...
Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques
One of the interesting topics in the multimedia domain is enabling computers to produce speech. Speech synthesis gives the computer the human ability to produce speech. The data-based approach and the process-based approach are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...
Improvements to a Sample-Concatenation Based Singing Voice Synthesizer
This paper describes recent improvements to our singing voice synthesizer based on concatenation and transformation of audio samples using spectral models. Improvements include firstly robust automation of previous singer database creation process, a lengthy and tedious task which involved recording scripts generation, studio sessions, audio editing, spectral analysis, and phonetic based segmen...
Audio Morphing
Approach: There are two variants of our work: inter-voice morphing and intra-voice morphing. In the intra-voice morphing scenario, a single person’s voice is recorded uttering a wide range of utterances. The speaker’s phones are then morphed in time to generate new utterances of the speaker. We note that intra-voice morphing addresses the same problem that concatenative speech synthesis algorit...
Publication date: 2007